4 research outputs found
Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start
Every day, thousands of users sign up as new Wikipedia contributors. Once they
have joined, these users must decide which articles to contribute to, which
users to seek out and learn from or collaborate with, and so on. Any such task is a hard
and potentially frustrating one given the sheer size of Wikipedia. Supporting
newcomers in their first steps by recommending articles they would enjoy
editing or editors they would enjoy collaborating with is thus a promising
route toward converting them into long-term contributors. Standard recommender
systems, however, rely on users' histories of previous interactions with the
platform. As such, these systems cannot make high-quality recommendations to
newcomers without any previous interactions -- the so-called cold-start
problem. The present paper addresses the cold-start problem on Wikipedia by
developing a method for automatically building short questionnaires that, when
completed by a newly registered Wikipedia user, can be used for a variety of
purposes, including article recommendations that can help new editors get
started. Our questionnaires are constructed based on the text of Wikipedia
articles as well as the history of contributions by the already onboarded
Wikipedia editors. We assess the quality of our questionnaire-based
recommendations in an offline evaluation using historical data, as well as an
online evaluation with hundreds of real Wikipedia newcomers, concluding that
our method provides cohesive, human-readable questions that perform well
against several baselines. By addressing the cold-start problem, this work can
help with the sustainable growth and maintenance of Wikipedia's diverse editor
community.
Comment: Accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM-2019).
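The abstract does not spell out the recommendation mechanics, but the core idea of turning questionnaire answers into article suggestions can be illustrated with a minimal sketch. Everything below is hypothetical (toy topic weights, a simple dot-product-style score), not the paper's actual method:

```python
# Illustrative sketch (NOT the paper's method): score articles for a
# newcomer from yes/no questionnaire answers over a fixed set of topics.

# Hypothetical per-article topic weights, for illustration only.
ARTICLES = {
    "Photosynthesis":    {"biology": 0.8, "chemistry": 0.2},
    "Linear algebra":    {"math": 0.9, "physics": 0.1},
    "French Revolution": {"history": 1.0},
}

def recommend(answers, articles, top_n=2):
    """Rank articles by their total weight on topics the newcomer endorsed.

    answers: dict mapping topic -> True/False (questionnaire responses).
    """
    liked = {topic for topic, yes in answers.items() if yes}
    scored = {
        title: sum(w for topic, w in topics.items() if topic in liked)
        for title, topics in articles.items()
    }
    # Highest-scoring articles first.
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

answers = {"biology": True, "chemistry": True, "history": False, "math": False}
print(recommend(answers, ARTICLES))  # "Photosynthesis" ranks first
```

The point of the sketch is only that a short questionnaire gives the system a nonzero user profile on day one, sidestepping the cold start that interaction-history-based recommenders face.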
Quantifying Engagement with Citations on Wikipedia
Wikipedia, the free online encyclopedia that anyone can edit, is one of the
most visited sites on the Web and a common source of information for many
users. As an encyclopedia, Wikipedia was not conceived as a source of original
information, but as a gateway to secondary sources: according to Wikipedia's
guidelines, facts must be backed up by reliable sources that reflect the full
spectrum of views on the topic. Although citations lie at the very heart of
Wikipedia, little is known about how users interact with them. To close this
gap, we built client-side instrumentation for logging all interactions with
links leading from English Wikipedia articles to cited references during one
month, and conducted the first analysis of readers' interaction with citations
on Wikipedia. We find that overall engagement with citations is low: about one
in 300 page views results in a reference click (0.29% overall; 0.56% on
desktop; 0.13% on mobile). Matched observational studies of the factors
associated with reference clicking reveal that clicks occur more frequently on
shorter pages and on pages of lower quality, suggesting that references are
consulted more commonly when Wikipedia itself does not contain the information
sought by the user. Moreover, we observe that recent content, open access
sources, and references about life events (births, deaths, marriages, etc.) are
particularly popular. Taken together, our findings open the door to a deeper
understanding of Wikipedia's role in a global information economy where
reliability is ever less certain, and source attribution ever more vital.
Comment: The Web Conference (WWW 2020), 10 pages.
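The headline engagement numbers are per-platform click-through rates, which reduce to a simple clicks-over-pageviews computation on the event log. The counts below are made up, chosen only so that the toy rates mirror the desktop (0.56%) and mobile (0.13%) figures quoted in the abstract:

```python
from collections import Counter

# Illustrative sketch: per-platform reference click-through rate from an
# event log. All events and counts are hypothetical, not the study's data.
events = (
    [("desktop", "pageview")] * 10_000 + [("desktop", "ref_click")] * 56
    + [("mobile", "pageview")] * 10_000 + [("mobile", "ref_click")] * 13
)

counts = Counter(events)
for platform in ("desktop", "mobile"):
    # CTR = fraction of page views that resulted in a reference click.
    rate = counts[(platform, "ref_click")] / counts[(platform, "pageview")]
    print(f"{platform}: {rate:.2%}")
```

Note that the overall rate (0.29% in the study) is a traffic-weighted average of the per-platform rates, so it depends on the real desktop/mobile pageview split rather than on equal toy counts like those above.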
Keeping Up with the Trends: Analyzing the Dynamics of Online Learning and Hiring Platforms in the Software Programming Domain
The Fourth Industrial Revolution has considerably sped up the pace of skill changes in many professional domains, with scores of new skills emerging and many old skills moving towards obsolescence. For these domains, identifying the new necessary skills in a timely manner is a difficult task, where existing methods are inadequate. Understanding the process by which these new skills and technologies appear in and diffuse through a professional domain could give training providers more time to identify these new skills and react. For this purpose, in the present work, we look at the dynamics between online learning platforms and online hiring platforms in the software programming profession, a rapidly evolving domain. To do so, we fuse four data sources together: Stack Overflow, an online community question-and-answer (Q&A) platform; Google Trends, which provides online search trends from Google; Udemy, a platform offering skill-based Massive Open Online Courses (MOOCs) where anyone can create courses; and Stack Overflow Jobs, a job ad platform. We place these platforms along two axes: i) how much expertise it takes, on average, to create content on them, and ii) whether, in general, the decision to create content on them is made by individuals or by groups. Our results show that the topics under study have a systematic tendency to appear earlier on platforms where content creation requires (on average) less expertise and is done more individually, rather than by groups: Stack Overflow is found to be more agile than Udemy, which is itself more agile than Stack Overflow Jobs (Google Trends did not prove usable due to extreme data sparsity). However, our results also show that this tendency is not present for all new skills, and that the software programming profession as a whole is remarkably agile: there are usually only a few months between the first Stack Overflow appearance of a new topic and its first appearance on Udemy or Stack Overflow Jobs.
In addition, we find that Udemy's agility has dramatically increased over time. Our novel methodology is able to provide valuable insights into the dynamics between online education and job ad platforms, enabling training program creators to look at said dynamics for various topics and to understand the pace of change. This allows them to maintain better awareness of the trends and to prioritize their attention, both on the right topics and on the right platforms.
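The agility comparison described above comes down to the lag between a topic's first appearance on one platform and its first appearance on another. A minimal sketch of that measurement, with entirely hypothetical topics and dates:

```python
from datetime import date
from statistics import median

# Illustrative sketch: median lag (in days) between a topic's first
# Stack Overflow appearance and its first appearance on another platform.
# All topic names and dates below are hypothetical.
first_seen = {
    "topic_a": {"stackoverflow": date(2017, 1, 10), "udemy": date(2017, 4, 2)},
    "topic_b": {"stackoverflow": date(2017, 6, 1),  "udemy": date(2017, 7, 15)},
    "topic_c": {"stackoverflow": date(2018, 2, 20), "udemy": date(2018, 3, 5)},
}

def median_lag_days(first_seen, src, dst):
    """Median per-topic lag in days from platform `src` to platform `dst`."""
    lags = [(dates[dst] - dates[src]).days for dates in first_seen.values()]
    return median(lags)

print(median_lag_days(first_seen, "stackoverflow", "udemy"))  # -> 44
```

Using the median rather than the mean keeps the estimate robust to a few topics with unusually long diffusion lags, which matters when the distribution of lags is skewed.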